In this note we present an algorithm for calculating the connected
components of a binary image. While not as fast as the logarithmic-time
connected-components algorithms known for massively parallel computing
systems (e.g. 1 processor per pixel) connected in shuffle-exchange or other
similar patterns (see Shiloach and Vishkin [1982]), the algorithm to be
presented is fast enough to handle images in real time and simple enough to
allow an easy and very economical hardware implementation. This improves
prior work reported in Ronse and Devijver [1984]; see also Lumia, Shapiro and
Zuniga [1983].

In what follows, we shall suppose that F is a digitized image having mn
pixels arranged in an n-row by m-column raster. Each pixel is either 0 or 1,
which we may also call 'black' or 'white'. Let S denote the set of all 1-pixels.
The task at hand is to decompose S into its connected components. More
precisely, we wish to assign a unique number to each component, and to
assign each white pixel its connected component number, so that we can
perform various simple operations on the image, such as displaying it with
each connected component shown in a different color, or computing various
moments of each component, etc. Since m and n can be fairly large, we do
not want to store an image in which each pixel is explicitly assigned an
identifying region number. Rather, we prefer to represent this output image
(i.e. the logical result of our connected-components algorithm) implicitly, by
building an auxiliary data structure of relatively small size, and by using that
structure to compute the component numbers on the fly for each pixel.

The linear-time, i.e. O(mn), algorithm to be presented makes two passes
over the binary image, processing rows of pixels one after the other. The
first pass is bottom-to-top, the second top-to-bottom. Up to the point at
which the algorithm begins to produce its final output (a stream of
component numbers, one per white pixel, produced in serial order of pixels),
the amount of auxiliary storage required (in addition to a frame buffer) is
just one additional bit per pixel, plus a small amount of high speed storage
proportional to m. Generating the numbers rapidly requires a high speed
store of m/2 -bit words, if unique component numbers are to be generated for
a worst case, unenhanced, image. During neither pass over the image is it
necessary to access the data for more than two adjacent rows simultaneously.
Processing of individual rows involves only stack-like operations, proceeding
first left-to-right, then right-to-left. Thus the algorithm needs only a simple
form of pushdown-stack storage, making high-speed hardware realization
convenient.

The  Algorithm 

The following definitions capture the (relatively straightforward)
invariants central to our connected components algorithm:

Definition 1: Let p be the index of an image row R_p, so that 1 ≤ p ≤ n.
Suppose for simplicity (of definition only) that two additional 0-pixels, one
on the extreme left, the other on the extreme right, are added to each row of the
image.

(a) A run of R_p is an unbroken sequence of 1-pixels of R_p bounded on both
sides by a 0-pixel.

(b) The lower semi-image I_p defined by p consists of the union of all rows
R_j with p ≤ j ≤ n.

(c) G_p is defined to be a partition of the set of runs in R_p, so that two
runs are in the same partition group g ∈ G_p if and only if they belong to the
same connected component of the lower semi-image I_p.
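
As a minimal sketch of Definition 1(a), assuming a row is supplied as a
Python list of 0/1 values, the runs of a row can be extracted as follows. The
function name runs_of_row and the (start, end) column-pair representation are
illustrative choices, not part of the original note.

```python
def runs_of_row(row):
    """Return the runs of a row as (start, end) column pairs, end inclusive."""
    # Pad with the two extra 0-pixels assumed in Definition 1.
    padded = [0] + list(row) + [0]
    runs = []
    start = None
    for i in range(1, len(padded)):
        if padded[i] == 1 and padded[i - 1] == 0:
            start = i - 1                 # column index in the unpadded row
        if padded[i] == 0 and padded[i - 1] == 1:
            runs.append((start, i - 2))   # last 1-pixel of the run
    return runs


print(runs_of_row([1, 1, 0, 1]))
```

With this representation each run costs two column indices, and the padding
guarantees that every run is closed by a 0-pixel on both sides.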

Our algorithm is defined by a simple update rule that allows the groups
in G_p to be calculated inductively during a bottom-to-top sweep through the
rows, and by an even simpler rule that allows co-membership in the same
connected component of the full image I_1 to be calculated inductively for
successive rows R_p during a top-to-bottom sweep.

The  precise  relationship  with  which  we  will  work  is  defined  as  follows: 

Definition 2: For each row index p we assign a bracket marking to each
run r of R_p as follows:

The marking assigned is either one of the symbols [, ] or is one of the
symbol pairs [ ] or ] [.

(a) If r is the leftmost (resp. rightmost) run in its group g, it is assigned the
mark [ (resp. ]). If r is both the leftmost and rightmost run in g (so that g
consists only of the single run r) then r is assigned the marking [ ].

(b) If r lies between the leftmost and rightmost run in its group, it is
assigned the marking ] [.
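
Definition 2 can be sketched in a few lines, assuming the group of every run in
a row is already known and given as a list of group identifiers in left-to-right
run order. The name bracket_marking and the string encoding of the four
markings ("[", "]", "[]", "][") are hypothetical conveniences for illustration.

```python
def bracket_marking(group_ids):
    """Assign the Definition 2 marking to each run, given its group id."""
    first, last = {}, {}
    for i, g in enumerate(group_ids):
        first.setdefault(g, i)   # leftmost run of group g
        last[g] = i              # rightmost run of group g (so far)
    marks = []
    for i, g in enumerate(group_ids):
        if first[g] == i and last[g] == i:
            marks.append("[]")   # the group consists of this single run
        elif first[g] == i:
            marks.append("[")    # leftmost run of its group
        elif last[g] == i:
            marks.append("]")    # rightmost run of its group
        else:
            marks.append("][")   # strictly between leftmost and rightmost
    return marks


print(bracket_marking(["a", "b", "a", "a"]))
```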

Definition 2 associates a single symbol ([, ], [ ], or ] [) with each
individual run in R_p and similarly by concatenating these in the left-to-right
order of runs within R_p we associate a bracket sequence with the whole row
R_p. We will observe just below that this run-to-bracket-marking association
can be stored in a very compact form; suppose for the moment that this has
been done. Then the following simple lemma hints at the way in which we
will want to proceed:

Lemma 1: (a) The bracket sequence that the preceding definitions associate
with the row R_p is properly nested, i.e. each right bracket in it is matched (by
the well-known simple stacking algorithm) to an associated left bracket and vice
versa.

(b) The groups into which we have divided the set of all runs in R_p can
be reconstructed from the bracket sequence associated with R_p by applying
the following rule: put all runs whose associated brackets match into one
group. (Note that according to this rule runs with the '] [' marking will link
certain runs to their left and certain runs on their right into a single group.)
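
The reconstruction rule of Lemma 1(b) can be illustrated by the usual stacking
algorithm. The sketch below assumes the markings of a row are given as the
strings "[", "]", "[]" and "][" in left-to-right run order; the function name
groups_from_marking and this encoding are assumptions made for the example.

```python
def groups_from_marking(marks):
    """Rebuild the groups of runs from a row's bracket sequence."""
    stack = []    # indices (into groups) of groups whose '[' is still open
    groups = []   # groups[k] is the list of run indices in group k
    for i, mark in enumerate(marks):
        if mark in ("[", "[]"):
            groups.append([])            # a left bracket opens a new group
            stack.append(len(groups) - 1)
        if not stack:
            raise ValueError("bracket sequence is not properly nested")
        groups[stack[-1]].append(i)      # run i belongs to the innermost open group
        if mark in ("]", "[]"):
            stack.pop()                  # a right bracket closes that group
    if stack:
        raise ValueError("unmatched left bracket")
    return groups


print(groups_from_marking(["[", "[]", "][", "]"]))
```

A run marked "][" neither pushes nor pops, which is how it links the runs on
its left and right into one group.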

Lemma 1 is easily proved, as follows. It is plain from Definition 2 that
the set of brackets associated with the runs of a single group of any R_p form
an unnested collection, and that all the runs of such a group are linked
together by these brackets in the manner stated in Lemma 1 (b). Thus we
have only to show that the brackets associated with different groups are
properly nested within one another. The following lemma asserts this.

Lemma 2: Let g and g' be distinct groups of runs in G_p. Then if there are
runs x_1, x_2 in g, which are to the left and to the right, respectively, of some run
x' in g', it follows that all the runs in g' lie between x_1 and x_2.

Proof: Since x_1 and x_2 belong to the same connected component of the
semi-image I_p, they can be connected by a simple arc A lying in I_p. Complete
A to a simple closed curve C by connecting its end points by a simple arc A'
lying above I_p. Since the whole component of I_p containing g' is disjoint from
C it must lie inside C, so that plainly all of g' must lie between x_1 and x_2 in the
row R_p. Q.E.D.

At this point in our argument we plainly want a rule telling us how to
calculate the groups (or, equivalently, the bracket marking) for the row R_{p-1},
given the same information for R_p. Our aim is just to determine which runs r
of R_{p-1} have other runs in the same group which lie to their left. Note then
that a run r in R_{p-1} has a run belonging to the same group lying to its left
(resp. right) if it touches a run of R_p belonging to a group g' one of whose
elements touches some run r' of R_{p-1} lying to the left (resp. right) of r. This
makes it easy to calculate groups of R_{p-1} from those of R_p via two scans, one
left-to-right, the other right-to-left.
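
For checking purposes, the groups of R_{p-1} can also be derived from those of
R_p with a small union-find, sketched below. This is a plain reference
computation, not the two-scan stack procedure of this note, and it is quadratic
in the number of runs. It assumes 4-connectivity (two runs in adjacent rows
touch when their column ranges overlap) and runs given as (start, end) column
pairs; the name groups_above and its arguments are hypothetical.

```python
def groups_above(upper_runs, lower_runs, lower_group):
    """Group the runs of R_{p-1} given the runs and groups of R_p.

    upper_runs, lower_runs: lists of (start, end) column pairs, end inclusive.
    lower_group[j]: identifier of the group of G_p containing lower_runs[j].
    Returns one representative index per upper run; two upper runs are in the
    same group of G_{p-1} exactly when their representatives are equal.
    """
    parent = list(range(len(upper_runs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    first_contact = {}   # lower group id -> an upper run that touches it
    for i, (a, b) in enumerate(upper_runs):
        for j, (c, d) in enumerate(lower_runs):
            if a <= d and c <= b:           # column ranges overlap: runs touch
                g = lower_group[j]
                if g in first_contact:
                    parent[find(i)] = find(first_contact[g])
                else:
                    first_contact[g] = i
    return [find(i) for i in range(len(upper_runs))]


upper = [(0, 0), (2, 2), (4, 4)]
lower = [(0, 2), (4, 4)]
print(groups_above(upper, lower, [0, 1]))
```

Two upper runs end up together exactly when a chain of lower groups links
them, which is the relation the two scans are meant to compute row by row.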

We now detail the first (left-to-right) of these scans, which identifies all
those runs r in R_{p-1} that belong to groups containing runs r' lying to the
left of r. We scan the rows R_{p-1} and R_p together, and simultaneously scan
the brackets assigned to runs in R_p. An auxiliary stack S is used to store left
brackets discovered during the scan of R_p that have not yet been matched. If
a run r in R_p is currently being scanned then the bracket on top of S will
describe the group containing r. Stacked brackets have two mark fields:
grouphit and old. The bracket on top of S will have its grouphit mark set to
1 whenever, during the scan of R_{p-1}, a pixel in R_p is discovered to be adjacent
to a pixel in R_{p-1}. This records the fact that some run in the group
represented by the top bracket (... S brackets) is known to be
adjacent to some run of R_{p-1}. The old mark distinguishes between the case in
which a group g of R_p represented by a stacked bracket with grouphit = 1
only has pixels adjacent to the run in R_{p-1} that is currently being scanned (in
which case old = 0), from the contrary case in which g is adjacent to a run in
R_{p-1} that has already been scanned (in which case old = 1).

The start of each run r' of R_p pushes its associated left bracket on S if the
marking of r' is either [ or [ ], and the end of r' pops the top bracket of S if r'
is marked either ] or [ ]. The marking ] [ is handled most efficiently by
regarding it as a 'no-op' which simply continues the bracket currently on the
stack. Whenever two adjacent white bits, belonging to runs r ∈ R_{p-1} and
r' ∈ R_p are seen, we check whether the top bracket in the stack is grouphit and
old. If this is the case, r must be connected to some run that belongs to R_{p-1},
and that run lies to the left of r. If not, the bracket's old mark must be zero,
since runs in the group of r' lying to the left of r' do not contact R_{p-1}; but in this
case the grouphit mark of this same bracket will already be set.

When the last pixel of a run r in R_{p-1} is encountered, the bracket on the
top of the stack can be marked old, since any recorded grouphits come from
fully scanned runs. Actually, we would like to mark all the stacked brackets
as old, not just the top one. This is easily done indirectly by marking the top
two brackets as old, and making the second of these two old marks 'sticky'
during a pop, so that the second old mark is set whenever the top is, and if
set, stays set during pop operations.

The subsequent right-to-left scan performs exactly the mirror image of
these actions, i.e. determines what runs r of R_{p-1} are part of a group
extending to their right. This gives us the bracket marking associated with
R_{p-1}.

Storage details for the stack are simple. Since, during the left-to-right
(resp. right-to-left) sweep, every item on the stack S is a left (resp. right)
bracket, these items need not be stored at all! However, we must store the
old and grouphit bits that the above procedure associates with these brackets;
this requires only two bits per stack entry. Alternatively, a counter can be
used to record the depth of the highest 'sticky' old mark in the stack (i.e.
highest set old bit having depth 2 or more). In such an implementation, the
stack will need only one bit per entry.

The 2-bit bracket codes defining the G_p must, of course, be stored for
subsequent use by the top-to-bottom pass. However, this can be done using
only one extra bit per pixel. Indeed, each run (except for the leftmost in
each row) must have a black pixel to its left, so that runs (with one
exception) are at least two bits long. Thus the bracket encodings can be
stored in a frame buffer (comprising one bit per pixel), and in standard
locations associated with one of the runs constituting a group. For example,
we can store the 2-bit code in consecutive bit positions, just before and at the
start of the leftmost run of each group.

At the end of the bottom-to-top sweep, the first row R_1 will have been
processed and its bracket marking will determine G_1, i.e. R_1's set of connected
component groups as defined by the whole image I_1. We then perform a
top-to-bottom pass which completes the assignment of connected component
numbers. To process the groups g of G_p after R_{p-1} has been processed, we
simply apply the following rule: if any run in g touches any run r' ∈ R_{p-1},
assign each pixel of g the same component number as that assigned to r'.
Otherwise g represents a new component of I_1; assign a new component
number to its pixels. To do this, we perform a simultaneous left-to-right
pass over R_p and R_{p-1} during which the parenthesis marking of R_p drives the
stacking-unstacking procedure previously described; during this process each
stacked bracket must be marked either with a zero (indicating no contact
yet), or with a nonzero integer defining a component number. While this is
being done, a queue giving the component number for each run of R_{p-1} must
be available. Component numbers can be stored at the end of the final run
of the corresponding group of R_p.
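
The numbering rule of the top-to-bottom pass can be transcribed directly, as
in the sketch below. It is not the stack-and-queue realization described
above: it assumes the groups G_p of every row have already been computed and
are passed in as per-row lists of group identifiers, that runs are (start, end)
column pairs with rows listed top first, and that runs touch under
4-connectivity. The name number_components is hypothetical.

```python
def number_components(rows_runs, rows_groups):
    """Assign a component number to every run, top row first.

    rows_runs[p]: list of (start, end) runs of row p (end inclusive).
    rows_groups[p]: group identifier of each run of row p within G_p.
    Returns labels with labels[p][k] = component number of run k of row p.
    """
    labels = []
    next_label = 1            # zero is kept free to mean "no contact yet"
    for p, (runs, groups) in enumerate(zip(rows_runs, rows_groups)):
        group_label = {}
        if p > 0:
            # A group inherits the number of any run above that it touches.
            for (a, b), g in zip(runs, groups):
                for (c, d), lab in zip(rows_runs[p - 1], labels[p - 1]):
                    if a <= d and c <= b:
                        group_label.setdefault(g, lab)
        row_labels = []
        for g in groups:
            if g not in group_label:      # no contact: a new component
                group_label[g] = next_label
                next_label += 1
            row_labels.append(group_label[g])
        labels.append(row_labels)
    return labels


# Image rows (top first): 1 0 1 / 1 1 1, so both top runs share one group.
print(number_components([[(0, 0), (2, 2)], [(0, 2)]], [[0, 0], [0]]))
```

Because the groups of each row already reflect everything below it, a single
downward sweep never needs to revise a number it has handed out.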

The  reader  can  easily  work  out  the  finer  details  of  this  algorithm. 

Additional Remarks; Applications. Since at least two pixels (counting
necessary black separators) are involved in each run, the queue for the
component number computations described above need store a distinct
component number for at most half the pixels in a row. Furthermore, the
number of distinct components is at most half the pixel count in the whole
raster. Thus, for example, to process a 512 x 512 image, 256 x 19 bits of
storage are required, if each component in a worst case "checker board" of
alternating black and white pixels is to have a unique number. The algorithm
also requires a stack of the same size, since stacked brackets are marked with
component numbers. Though a variety of current technologies can support
such a high speed storage requirement, it should be noted that for many
applications, a considerably smaller queue would suffice. For example, a
simple digital filter can be used to ignore many small (uninteresting)
components. The moment calculations we now describe can also be used for
this purpose.
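
The worst-case counts in the preceding paragraph can be reproduced with a
little arithmetic. The sketch below computes the maximum number of runs per
row and the maximum number of components for an m-column by n-row
checkerboard, together with a lower bound on the number width, assuming zero
is reserved for "no contact yet". The function name worst_case_storage is
hypothetical, and the bound is not a derivation of the 19-bit width quoted
above.

```python
def worst_case_storage(m, n):
    """Worst-case counts for an m-column by n-row binary image."""
    max_runs_per_row = (m + 1) // 2      # alternating pixels: one run per two columns
    max_components = (m * n + 1) // 2    # checkerboard: every white pixel is isolated
    # Bits needed to represent the values 0..max_components inclusive.
    min_bits = max_components.bit_length()
    return max_runs_per_row, max_components, min_bits


print(worst_case_storage(512, 512))
```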

In some applications, it will be useful to identify the k largest
components and/or to calculate various additive geometric invariants of these
components, e.g. their number of pixels, medians and second moments.
These computations can be performed in a variety of ways. One method is to
compute these values on the fly during the bottom-to-top (i.e. first) pass.
Intermediate values are stored in the same stack-based way as the brackets in
the foregoing algorithm. To handle the case in which a component splits into
several runs during a transition from one row to the next, running totals
should be passed to just one of the continuing branches, and the other
branches should be given initial totals. Terminating branches must have their
values transferred to continuing branches, should any exist. The stack
organization described above accomplishes this quite easily. When all
branches of a component terminate, the moment computation for that
component is complete. Use of an appropriate component numbering scheme
allows other operations, e.g. extraction of the k largest components, to be
performed rapidly.
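
The bookkeeping for additive invariants can be sketched as follows, assuming
each branch of a component carries a small record of running sums. The class
name Totals, its fields, and the method names add_run and absorb are
illustrative assumptions; the stack-based storage itself is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Totals:
    """Additive running totals for one branch of a component."""
    count: int = 0   # number of pixels
    sx: int = 0      # sum of column indices
    sy: int = 0      # sum of row indices
    sxx: int = 0     # sum of squared column indices
    syy: int = 0     # sum of squared row indices

    def add_run(self, row, start, end):
        """Accumulate the pixels of one run (columns start..end inclusive)."""
        for x in range(start, end + 1):
            self.count += 1
            self.sx += x
            self.sy += row
            self.sxx += x * x
            self.syy += row * row

    def absorb(self, other):
        """Transfer a terminating branch's totals into this continuing branch."""
        self.count += other.count
        self.sx += other.sx
        self.sy += other.sy
        self.sxx += other.sxx
        self.syy += other.syy


# Two branches of one component: one continues, the other starts from
# initial (zero) totals and is later merged back.
continuing, fresh = Totals(), Totals()
continuing.add_run(row=5, start=0, end=1)
fresh.add_run(row=5, start=3, end=3)
continuing.absorb(fresh)
print(continuing)
```

Because every field is a plain sum, the order in which branches are merged
does not affect the final totals of a component.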